Creating labels for Machine Learning - Python Programming for Finance p. 11 - Key Error XOM_2d

by: ALB_T800, 6 years ago


I went through the tutorial for p. 11, Creating labels for Machine Learning, and received a KeyError: 'XOM_2d'. My code is below. Am I missing anything critical? I would appreciate any help, as I would like to continue with the Python lessons.

Code:    
from collections import Counter
import numpy as np
import pandas as pd
import pickle


def process_data_for_labels(ticker):
    hm_days = 7
    df = pd.read_csv('sp500_joined_closes.csv', index_col=0)
    tickers = df.columns.values.tolist()
    df.fillna(0, inplace=True)

    for i in range(1, hm_days + 1):
        df['{}_{}d'.format(ticker, i)] = (df[ticker].shift(-1) - df[ticker]) / df[ticker]

        df.fillna(0, inplace=True)
        return tickers, df


def buy_sell_hold(*args):
    cols = [c for c in args]
    requirement = 0.02
    for col in cols:
        if col > requirement:
            return 1  # BUY
        if col < - requirement:
            return -1  # SELL
        return 0  # HOLD


def extract_featuresets (ticker):

    tickers, df = process_data_for_labels(ticker)

    df['{}_target'.format(ticker)] = list(map(buy_sell_hold,
                                              df['{}_1d'.format(ticker)],
                                              df['{}_2d'.format(ticker)],
                                              df['{}_3d'.format(ticker)],
                                              df['{}_4d'.format(ticker)],
                                              df['{}_5d'.format(ticker)],
                                              df['{}_6d'.format(ticker)],
                                              df['{}_7d'.format(ticker)]))

    vals = df['{}_target'.format(ticker).values.tolist()]
    str_vals = [str(i) for i in vals]
    print('Data spread:', Counter(str_vals))

    df.fillna(0, inplace=True)
    df = df.replace([np.inf, -np.inf], np.nan)
    df.dropna(inplace=True)
    df_vals = df[[ticker for ticker in tickers]].pct_change()
    df_vals = df_vals.replace([np.inf, -np.inf], 0)
    df_vals.fillna(0, inplace=True)

    X = df_vals.values
    y = df['{}_target'.format(ticker)].values

    return X, y, df


extract_featuresets('XOM')






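The problem lies within the process_data_for_labels function; I was doing the same thing. You currently have df.fillna(0, inplace=True) and return tickers, df indented inside the for loop, so the function returns after the first iteration, when only the XOM_1d column exists. Dedent both lines so they run once the loop has finished, and XOM_2d through XOM_7d will be created.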

-plantand000 5 years ago
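
For anyone hitting the same error, here is a minimal sketch of the fix described above. It is not the tutorial's exact code: the function takes a DataFrame argument instead of reading sp500_joined_closes.csv, and the price data is a made-up stand-in so the snippet runs on its own. Two further changes are assumptions about what the posted code intended: shift(-i) instead of shift(-1), so each column looks a different number of days ahead, and return 0 moved outside the loop in buy_sell_hold, so every column gets checked. Separately, the vals line in the posted extract_featuresets appears to have a misplaced bracket; it likely should read df['{}_target'.format(ticker)].values.tolist().

```python
import numpy as np
import pandas as pd


def process_data_for_labels(df, ticker, hm_days=7):
    """Add <ticker>_1d ... <ticker>_<hm_days>d forward-return columns."""
    df = df.fillna(0)  # returns a new frame, so the caller's data is untouched
    tickers = df.columns.values.tolist()

    for i in range(1, hm_days + 1):
        # Assumption: shift(-i) looks i rows ahead; the posted code used shift(-1) for every i.
        df['{}_{}d'.format(ticker, i)] = (df[ticker].shift(-i) - df[ticker]) / df[ticker]

    # These two lines now sit outside the loop, so all hm_days columns exist before returning.
    df = df.fillna(0)
    return tickers, df


def buy_sell_hold(*args, requirement=0.02):
    for col in args:
        if col > requirement:
            return 1  # BUY
        if col < -requirement:
            return -1  # SELL
    return 0  # HOLD, only after every column has been checked


# Hypothetical stand-in for sp500_joined_closes.csv (made-up prices).
prices = pd.DataFrame(
    {'XOM': np.linspace(80.0, 90.0, 20), 'CVX': np.linspace(100.0, 95.0, 20)},
    index=pd.date_range('2017-01-02', periods=20, freq='B'),
)

tickers, labeled = process_data_for_labels(prices, 'XOM')
label_cols = ['XOM_{}d'.format(i) for i in range(1, 8)]
labeled['XOM_target'] = list(map(buy_sell_hold, *[labeled[c] for c in label_cols]))
print(labeled[label_cols + ['XOM_target']].head())
```

With the return dedented, df['XOM_2d'] and the later columns exist by the time extract_featuresets looks them up, which is what the KeyError was complaining about.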
